The mean and variance of phylogenetic diversity under rarefaction
Phylogenetic diversity (PD) depends on sampling intensity, which complicates
the comparison of PD between samples of different depth. One approach to
dealing with differing sample depth for a given diversity statistic is to
rarefy, which means to take a random subset of a given size of the original
sample. Exact analytical formulae for the mean and variance of species richness
under rarefaction have existed for some time but no such solution exists for
PD. We have derived exact formulae for the mean and variance of PD under
rarefaction. We show that these formulae are correct by comparing the exact
mean and variance to those calculated by repeated random (Monte Carlo)
subsampling of a dataset of stem counts of woody shrubs of Toohey Forest,
Queensland, Australia. We also demonstrate the application of the method using
two examples: identifying hotspots of mammalian diversity in Australasian
ecoregions, and characterising the human vaginal microbiome. There is a very
high degree of correspondence between the analytical and random subsampling
methods for calculating mean and variance of PD under rarefaction, although the
Monte Carlo method requires a large number of random draws to converge on the
exact solution for the variance. Rarefaction of mammalian PD of ecoregions in
Australasia to a common standard of 25 species reveals very different rank
orderings of ecoregions, indicating quite different hotspots of diversity than
those obtained for unrarefied PD. The application of these methods to the
vaginal microbiome shows that a classical score used to quantify bacterial
vaginosis is correlated with the shape of the rarefaction curve. The analytical
formulae for the mean and variance of PD under rarefaction are both exact and
more efficient than repeated subsampling. Rarefaction of PD allows for many
applications where comparison of samples of different depth is required.
Comment: Final version to be published in Methods in Ecology and Evolution
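
The comparison between an exact solution and Monte Carlo subsampling described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the classical hypergeometric form for expected species richness under rarefaction, and it assumes the mean of rooted PD takes the analogous per-branch form, with each branch weighted by its length and by the probability that at least one individual descending from it is drawn. The three-species tree, the counts, and all function names are hypothetical, and the variance formula is not reproduced here.

```python
import random
from math import comb

def expected_richness(counts, n):
    """Exact mean species richness in a random subsample of n individuals."""
    total = sum(counts)
    denom = comb(total, n)
    # A species with c individuals is missed with probability C(total - c, n) / C(total, n).
    return sum(1 - comb(total - c, n) / denom for c in counts)

def expected_pd(branches, total, n):
    """Assumed per-branch form of mean rooted PD under rarefaction.

    branches: list of (branch_length, individuals descending from that branch).
    """
    denom = comb(total, n)
    return sum(length * (1 - comb(total - nb, n) / denom) for length, nb in branches)

def monte_carlo_pd(tip_counts, branch_tips, n, draws, seed=0):
    """Mean PD over repeated random subsamples of n individuals.

    tip_counts:  {tip: number of individuals}
    branch_tips: list of (branch_length, set of tips below the branch)
    """
    rng = random.Random(seed)
    pool = [tip for tip, c in tip_counts.items() for _ in range(c)]
    running = 0.0
    for _ in range(draws):
        present = set(rng.sample(pool, n))  # sampling without replacement
        running += sum(length for length, tips in branch_tips if tips & present)
    return running / draws

# Hypothetical toy tree ((A:1, B:1):2, C:3) with made-up abundances.
TIP_COUNTS = {"A": 5, "B": 3, "C": 2}
BRANCH_TIPS = [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}), (3.0, {"C"})]
BRANCHES = [(length, sum(TIP_COUNTS[t] for t in tips)) for length, tips in BRANCH_TIPS]
TOTAL = sum(TIP_COUNTS.values())

if __name__ == "__main__":
    for depth in (1, 4, 10):
        exact = expected_pd(BRANCHES, TOTAL, depth)
        approx = monte_carlo_pd(TIP_COUNTS, BRANCH_TIPS, depth, draws=20000)
        print(depth, round(exact, 3), round(approx, 3))
```

The exact calculation needs one pass over the branches, whereas the Monte Carlo estimate only approaches it as the number of draws grows, which is consistent with the efficiency argument made in the abstract.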
Likelihood-based inference of B-cell clonal families
The human immune system depends on a highly diverse collection of
antibody-making B cells. B cell receptor sequence diversity is generated by a
random recombination process called "rearrangement" forming progenitor B cells,
then a Darwinian process of lineage diversification and selection called
"affinity maturation." The resulting receptors can be sequenced in high
throughput for research and diagnostics. Such a collection of sequences
contains a mixture of various lineages, each of which may be quite numerous, or
may consist of only a single member. As a step to understanding the process and
result of this diversification, one may wish to reconstruct lineage membership,
i.e. to cluster sampled sequences according to which came from the same
rearrangement events. We call this clustering problem "clonal family
inference." In this paper we describe and validate a likelihood-based framework
for clonal family inference based on a multi-hidden Markov Model (multi-HMM)
framework for B cell receptor sequences. We describe an agglomerative algorithm
to find a maximum likelihood clustering, two approximate algorithms with
various trade-offs of speed versus accuracy, and a third, fast algorithm for
finding specific lineages. We show that under simulation these algorithms
greatly improve upon existing clonal family inference methods, and that they
also give significantly different clusters than previous methods when applied
to two real data sets.
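
The agglomerative idea in this abstract, repeatedly merging the pair of clusters whose union is best supported by a likelihood ratio, can be sketched in a few lines of Python. This is a toy illustration, not the paper's method: the multi-HMM is replaced by a deliberately simple stand-in model in which every sequence in a family differs from one unknown ancestor independently at each site with an assumed probability MU. The model, the value of MU, the example sequences, and the function names are all hypothetical, and the sequences are assumed to be aligned and of equal length. The greedy merge shown here is a heuristic and is not guaranteed to find the maximum likelihood clustering.

```python
from itertools import combinations
from math import log

MU = 0.05       # assumed per-site mutation probability (hypothetical)
BASES = "ACGT"

def cluster_loglik(seqs):
    """Log marginal likelihood that all seqs derive from one unknown ancestor."""
    total = 0.0
    for column in zip(*seqs):            # one alignment column per site
        site = 0.0
        for ancestor in BASES:           # uniform prior over the ancestral base
            p = 0.25
            for base in column:
                p *= (1 - MU) if base == ancestor else MU / 3
            site += p
        total += log(site)
    return total

def agglomerate(seqs):
    """Greedily merge clusters while some merge increases the total log likelihood."""
    clusters = [[s] for s in seqs]
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for i, j in combinations(range(len(clusters)), 2):
            # Log likelihood ratio: one shared ancestor versus two separate ones.
            gain = (cluster_loglik(clusters[i] + clusters[j])
                    - cluster_loglik(clusters[i])
                    - cluster_loglik(clusters[j]))
            if gain > best_gain:
                best_gain, best_pair = gain, (i, j)
        if best_pair is None:            # no merge is favoured, so stop
            break
        i, j = best_pair                 # i < j, so deleting j leaves i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

if __name__ == "__main__":
    toy = ["ACGTACGT", "TTGCATGC", "ACGTACGA", "TTGCATGG", "ACGAACGT"]
    for family in agglomerate(toy):
        print(family)
```

Each pass rescans all pairs, so the cost grows quickly with the number of sequences, which gives some intuition for why the abstract also describes approximate algorithms that trade accuracy for speed.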